Neural Empirical Bayes
Causal Representation Learning Made Identifiable by Grouping of Observational Variables
A topic of great current interest is Causal Representation Learning (CRL),
whose goal is to learn a causal model for hidden features in a data-driven
manner. Unfortunately, CRL is severely ill-posed since it is a combination of
the two notoriously ill-posed problems of representation learning and causal
discovery. Yet, finding practical identifiability conditions that guarantee a
unique solution is crucial for its practical applicability. Most approaches so
far have been based on assumptions on the latent causal mechanisms, such as
temporal causality or the existence of supervision or interventions; these can be
too restrictive in actual applications. Here, we show identifiability based on
novel, weak constraints which require no temporal structure, interventions,
or weak supervision. The approach is based on assuming that the observational
mixing exhibits a suitable grouping of the observational variables. We also propose a
novel self-supervised estimation framework consistent with the model, prove its
statistical consistency, and experimentally show its superior CRL performance
compared to state-of-the-art baselines. We further demonstrate its
robustness against latent confounders and causal cycles.
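To make the grouping assumption concrete, below is a minimal NumPy sketch of one way such data could arise: each disjoint group of observed variables is a nonlinear mixture of its own latent variable, and the latents are causally related across groups. The group sizes, mixing functions, and the group-shuffling contrastive task are illustrative assumptions, not the paper's exact model or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent causal model across groups (illustrative): z1 -> z2,
# with non-Gaussian noise.
z1 = rng.laplace(size=n)
z2 = 0.8 * z1 + 0.5 * rng.laplace(size=n)

def mix(z, w):
    # Nonlinear within-group mixing of one latent into a 3-D group.
    return np.tanh(np.outer(z, w)) + 0.1 * np.outer(z, np.ones_like(w))

x_group1 = mix(z1, np.array([1.0, -0.7, 0.3]))    # group 1 observes z1 only
x_group2 = mix(z2, np.array([0.5, 1.2, -0.4]))    # group 2 observes z2 only
x = np.concatenate([x_group1, x_group2], axis=1)  # what the learner sees

# One plausible self-supervised task (an assumption, not the paper's
# exact algorithm): discriminate real group pairs from group-shuffled
# negatives, which forces a classifier to model the cross-group
# dependence induced by the latent causal links.
perm = rng.permutation(n)
negatives = np.concatenate([x_group1, x_group2[perm]], axis=1)
```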
A mixture of sparse coding models explaining properties of face neurons related to holistic and parts-based processing
Experimental studies have revealed evidence of both parts-based and holistic representations of objects and faces in the primate visual system. However, it is still a mystery how such seemingly contradictory types of processing can coexist within a single system. Here, we propose a novel theory called mixture of sparse coding models, inspired by the formation of category-specific subregions in the inferotemporal (IT) cortex. We developed a hierarchical network that constructed a mixture of two sparse coding submodels on top of a simple Gabor analysis. The submodels were each trained with face or non-face object images, which resulted in separate representations of facial parts and object parts. Importantly, evoked neural activities were modeled by Bayesian inference, which had a top-down explaining-away effect that enabled recognition of an individual part to depend strongly on the category of the whole input. We show that this explaining-away effect was indeed crucial for the units in the face submodel to exhibit significant selectivity to face images over object images in a similar way to actual face-selective neurons in the macaque IT cortex. Furthermore, the model explained, qualitatively and quantitatively, several tuning properties for facial features found in the middle patch of face processing in IT, as documented by Freiwald, Tsao, and Livingstone (2009). These included, in particular, tuning to only a small number of facial features that were often related to geometrically large parts like face outline and hair, preference and anti-preference of extreme facial features (e.g., very large/small inter-eye distance), and reduction of the gain of feature tuning for partial face stimuli compared to whole face stimuli. Thus, we hypothesize that the coding principle of facial features in the middle patch of face processing in the macaque IT cortex may be closely related to a mixture of sparse coding models.
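As a rough illustration of the inference idea, the sketch below builds a two-component mixture of sparse coding models and shows the explaining-away effect: each submodel's unit responses are gated by the posterior over the category of the whole input. The dictionary sizes, the ISTA solver, and the softmax gating are illustrative assumptions, not the authors' trained hierarchical network.

```python
import numpy as np

rng = np.random.default_rng(0)
D_face = rng.normal(size=(64, 100)); D_face /= np.linalg.norm(D_face, axis=0)
D_obj  = rng.normal(size=(64, 100)); D_obj  /= np.linalg.norm(D_obj, axis=0)

def ista(x, D, lam=0.1, n_iter=200):
    """Sparse code a = argmin_a 0.5*||x - D a||^2 + lam*||a||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2                # Lipschitz const. of gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - x) / L            # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
    return a

def infer(x, lam=0.1):
    """MAP-style inference in a two-submodel mixture of sparse coding models."""
    codes, energies = [], []
    for D in (D_face, D_obj):
        a = ista(x, D, lam)
        codes.append(a)
        energies.append(0.5 * np.sum((x - D @ a) ** 2) + lam * np.abs(a).sum())
    en = np.array(energies)
    p = np.exp(-(en - en.min())); p /= p.sum()   # posterior over category
    # Explaining away: each submodel's unit responses are gated by the
    # category posterior, so a "face part" unit is suppressed whenever the
    # whole input is better explained as a non-face object.
    return p, [p[k] * codes[k] for k in range(2)]

p, (r_face, r_obj) = infer(rng.normal(size=64))
```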
Density Estimation in Infinite Dimensional Exponential Families
In this paper, we consider an infinite dimensional exponential family $\mathcal{P}$
of probability densities, which are parametrized by functions in
a reproducing kernel Hilbert space $H$, and show it to be quite rich in the
sense that a broad class of densities on $\mathbb{R}^d$ can be approximated
arbitrarily well in Kullback-Leibler (KL) divergence by elements in
$\mathcal{P}$. The main goal of the paper is to estimate an unknown density $p_0$
through an element in $\mathcal{P}$. Standard techniques like maximum
likelihood estimation (MLE) or pseudo MLE (based on the method of sieves),
which are based on minimizing the KL divergence between $p_0$ and
$\mathcal{P}$, do not yield practically useful estimators because of their
inability to efficiently handle the log-partition function. Instead, we propose
an estimator $\hat{p}_n$ based on minimizing the \emph{Fisher divergence}
$J(p_0\Vert p)$ between $p_0$ and $p\in\mathcal{P}$, which involves solving a
simple finite-dimensional linear system. When $p_0\in\mathcal{P}$, we show that
the proposed estimator is consistent, and provide a convergence rate of
$n^{-\min\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\}}$ in Fisher
divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$
for some $\beta\ge 0$, where $C$ is a certain
Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of
$C^\beta$. We also investigate the misspecified case of $p_0\notin\mathcal{P}$
and show that $J(p_0\Vert\hat{p}_n)\to\inf_{p\in\mathcal{P}}J(p_0\Vert p)$ as
$n\to\infty$, and provide a rate for this convergence under a
similar smoothness condition as above. Through numerical simulations we
demonstrate that the proposed estimator outperforms the non-parametric kernel
density estimator, and that the advantage with the proposed estimator grows as
$d$ increases.
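The computational point, that minimizing the Fisher divergence over an exponential family sidesteps the log-partition function and reduces to a linear system, can be seen in a finite-dimensional analogue. The sketch below fits $p_\theta(x)\propto\exp(\theta_1 x+\theta_2 x^2)$ to data by score matching; the polynomial sufficient statistics are an illustrative stand-in for the paper's RKHS parametrization, not its actual estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=0.7, size=5000)  # unknown density to estimate

# Model: log p_theta(x) = theta1*x + theta2*x^2 - log Z(theta).
# The score psi(x) = d/dx log p_theta(x) = theta1 + 2*theta2*x does not
# involve log Z, and the score matching objective
# J(theta) = E[0.5*psi(x)^2 + psi'(x)] is quadratic in theta, so its
# minimizer solves the linear system A theta = -b:
T1 = np.stack([np.ones_like(x), 2 * x])  # dT/dx for T(x) = (x, x^2)
A = T1 @ T1.T / len(x)                   # E[T'(x) T'(x)^T]
b = np.array([0.0, 2.0])                 # E[T''(x)]
theta = np.linalg.solve(A, -b)

# For Gaussian data this recovers theta2 = -1/(2 sigma^2), theta1 = mu/sigma^2.
sigma2_hat = -1.0 / (2 * theta[1])
mu_hat = theta[0] * sigma2_hat
print(mu_hat, sigma2_hat)                # approx 1.5 and 0.49
```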
The Optimal Noise in Noise-Contrastive Learning Is Not What You Think
Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, where data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet, such contrastive learning requires a good noise distribution, which is hard to specify; domain-specific heuristics are therefore widely used. While a comprehensive theory is missing, it is widely assumed that the optimal noise should in practice be made equal to the data, both in distribution and proportion; this setting underlies Generative Adversarial Networks (GANs) in particular. Here, we empirically and theoretically challenge this assumption on the optimal noise. We show that deviating from this assumption can actually lead to better statistical estimators, in terms of asymptotic variance. In particular, the optimal noise distribution is different from the data's and can even belong to a different family.
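For context, a minimal sketch of NCE itself: an unnormalized model is fit by logistic regression that discriminates data from noise samples, with the log-partition function absorbed into a free parameter. The 1-D Gaussian model, the noise scale, and the optimizer below are illustrative assumptions; the abstract's point is precisely that the choice of noise matters.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x_data = rng.normal(loc=1.0, scale=0.5, size=2000)

# Noise distribution: a free design choice; the paper's point is that
# matching it to the data is not necessarily optimal.
noise = norm(loc=0.0, scale=2.0)
x_noise = noise.rvs(size=2000, random_state=rng)

def log_model(x, theta):
    # Unnormalized model: theta = (mu, log_sigma, c); c is a free parameter
    # standing in for the unknown negative log-partition function.
    mu, log_sigma, c = theta
    return -0.5 * ((x - mu) / np.exp(log_sigma)) ** 2 + c

def nce_loss(theta):
    # Logistic regression of data (label 1) vs. noise (label 0) with
    # log-odds G(u) = log p_model(u) - log p_noise(u).
    g_data = log_model(x_data, theta) - noise.logpdf(x_data)
    g_noise = log_model(x_noise, theta) - noise.logpdf(x_noise)
    # -E[log sigmoid(g_data)] - E[log(1 - sigmoid(g_noise))], stably:
    return np.logaddexp(0, -g_data).mean() + np.logaddexp(0, g_noise).mean()

theta_hat = minimize(nce_loss, x0=np.zeros(3)).x
print(theta_hat[:2])  # approx (1.0, log 0.5); theta_hat[2] absorbs log Z
```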
Sparse Linear Identifiable Multivariate Modeling
In this paper we consider sparse and identifiable linear latent variable
(factor) and linear Bayesian network models for parsimonious analysis of
multivariate data. We propose a computationally efficient method for joint
parameter and model inference, and model comparison. It consists of a fully
Bayesian hierarchy for sparse models using slab and spike priors (two-component
delta-function and continuous mixtures), non-Gaussian latent factors and a
stochastic search over the ordering of the variables. The framework, which we
call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and
benchmarked on artificial and real biological data sets. SLIM is closest in
spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in
inference, Bayesian network structure learning and model comparison.
Experimentally, SLIM performs equally well or better than LiNGAM with
comparable computational complexity. We attribute this mainly to the stochastic
search strategy used, and to parsimony (sparsity and identifiability), which is
an explicit part of the model. We propose two extensions to the basic i.i.d.
linear framework: non-linear dependence on observed variables, called SNIM
(Sparse Non-linear Identifiable Multivariate modeling) and allowing for
correlations between latent variables, called CSLIM (Correlated SLIM), for
temporal and/or spatial data. The source code and scripts are available from
http://cogsys.imm.dtu.dk/slim/.
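A toy sketch of the model class SLIM operates on, a sparse linear SEM with a spike-and-slab edge structure and non-Gaussian noise; the dimensions, prior parameters, and sampling scheme below are illustrative assumptions, not SLIM's inference code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1000

# Spike-and-slab draw of a strictly lower-triangular weight matrix B:
# a Bernoulli "spike" decides whether an edge exists, a continuous
# "slab" draws its weight (Gaussian here, as an illustrative choice).
mask = np.tril(rng.random((d, d)) < 0.4, k=-1)  # sparse edge pattern
B = mask * rng.normal(scale=1.0, size=(d, d))

# Non-Gaussian (Laplace) noise makes the linear model identifiable, as in
# LiNGAM (Shimizu et al., 2006): x = B x + e  =>  x = (I - B)^{-1} e.
e = rng.laplace(size=(d, n))
x = np.linalg.solve(np.eye(d) - B, e)           # one sample per column
```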